Conversation

@kazimuth (Contributor) commented Sep 12, 2023

Description of Changes

Rewrite benchmarks to allow direct comparison between:

  • spacetime_module: Spacetime via a module
  • spacetime_raw: Spacetime via the RelationalDB struct
  • sqlite

And add some targeted serialization benchmarks.

@kazimuth force-pushed the kazimuth/benchwrangle branch 3 times, most recently from 62e0a3c to f797782 on September 15, 2023 at 18:39
@kazimuth mentioned this pull request on Sep 15, 2023
@kazimuth (Contributor, Author) commented:

benchmarks please

@kazimuth force-pushed the kazimuth/benchwrangle branch from fef1058 to 73e591b on September 18, 2023 at 18:17
@kazimuth force-pushed the kazimuth/benchwrangle branch from c3e555d to 303b332 on September 20, 2023 at 17:46
@kulakowski (Contributor) left a comment:

Awesome. Inline is mostly minor comments about comments. I think this PR can land whenever, as far as I'm concerned, but I want two things.

One is the framework or schema or strategy or whatever you want to call it, by which you derived the set of benchmarks to write out. If Joshua (say) adds an Update operation that isn't just a delete+insert (say), I'd love for him to be able to read that framework and just slot in exactly the right set of benchmarks. I'd like this to be part of this PR.

Two, if there's a ways to go, I'd like other people to be able to see either in the code or in a ticket how far there is to go until this work is complete. I'm happy if there are more PRs following this one until you get there.

fn clear_table(&self, table_name: String) -> Result<(), anyhow::Error> {
    let db = &*self.worker_database_instance.relational_db;
    db.with_auto_commit(|tx| {
        let tables = db.get_all_tables(tx)?;
Contributor:
We could plumb through a table_by_name function?

let tables = db.get_all_tables(tx)?;
for table in tables {
    if table.table_name != table_name {
        continue;
Contributor:

We should also return once we find it. This probably can also be expressed as a call to find or whatever on tables.iter, if we don't do a table_by_name approach.

Contributor (Author):

I wasn't sure if there could be multiple tables with the same name here, can there be?

fn build(prefill: bool, fsync: bool) -> ResultBench<Self>

pub struct SQLite {
    db: Connection,
    _temp_dir: TempDir,
Contributor:

The finickiness of the TempDir drop stuff makes me want a comment here, we've had enough bugs in tests and CI about it that I don't want this refactored away to a local or whatever.

Contributor (Author):

In fact I think refactoring it to a local could cause the tempdir to be dropped, which might break things... then again, the earlier benchmarks version did that and the sqlite benches still worked. So idk.


if prefill {
    prefill_data(&mut db, Runs::Small)?;

fn create_table<T: BenchTable>(&mut self, table_style: crate::schemas::TableStyle) -> ResultBench<Self::TableId> {
Contributor:

Mario or someone might be a better reviewer of the sqlite stuff here than me.

}

type PreparedFilter = PreparedFilter;
#[inline(never)]
Contributor:

I picked this #[inline(never)] one arbitrarily out of all of them to comment on: you should describe where and why you do this in some top-level place if possible!

Contributor (Author):

I added them all in an attempt to improve compile times, but it didn't have a huge effect, because everything is still being heavily monomorphized due to the generics. I'll strip them out.


pub struct SpacetimeModule {
    runtime: Runtime,
    // it's necessary for this to be dropped BEFORE the runtime.
Contributor:

I personally prefer an explicit Drop impl to e.g. a comment noting that the drop ordering relies on field declaration order. Maybe Mazdak or Noa have a stronger style opinion about it, I'd be curious.

@kazimuth (Contributor, Author) commented Sep 25, 2023

@kulakowski, I'll address your comments and then I think this is ready for merge.

> One is the framework or schema or strategy or whatever you want to call it, by which you derived the set of benchmarks to write out. If Joshua (say) adds an Update operation that isn't just a delete+insert (say), I'd love for him to be able to read that framework and just slot in exactly the right set of benchmarks. I'd like this to be part of this PR.

Good point, I'll document this.

> Two, if there's a ways to go, I'd like other people to be able to see either in the code or in a ticket how far there is to go until this work is complete. I'm happy if there are more PRs following this one until you get there.

Yeah, I wasn't really sure where I was going while working on this which meant I had a hard time communicating what I was doing. I have some more targeted benchmark ideas which I'll stick in a followup PR.

I do have one remaining concern, which is benchmark time. Looking at the tests on this PR, it seems that running benchmarks on GitHub Actions currently takes about 80 minutes. We could:

  • Disable the in-memory benchmarks on GitHub Actions, which would get rid of one axis of this giant hypercube. Alternatively, disable the sqlite benchmarks, since those won't change. I think both of those provide useful information, though.
  • Reduce benchmark warm-up and sampling times. This would increase variance.
  • Just accept that benchmarks are slow.

@kazimuth (Contributor, Author) commented Sep 25, 2023

Oh, also, should I rename "Person" to "IntStringInt" and "Location" to "IntIntInt" or something similar? Having mysterious schema names in benchmark results might confuse people.

@kazimuth (Contributor, Author) commented:

I have a path to fix benchmark times -- it's because everything is being run twice, including the sqlite benchmarks. I'll adjust our GitHub Actions benchmark script to fix that.

@kazimuth enabled auto-merge (squash) on September 27, 2023 at 19:32
@mamcx (Contributor) left a comment:

LGTM

These calls unconditionally panicked before anyway;
now they just panic in a way that Clippy doesn't object to.
@kim mentioned this pull request on Sep 29, 2023
@kazimuth merged commit 010c7e3 into master on Sep 29, 2023
bfops pushed a commit that referenced this pull request Jul 17, 2025